Binary file

A binary file is a computer file which may contain any type of data, encoded in binary form for computer storage and processing purposes; for example, computer document files containing formatted text. Many binary file formats contain parts that can be interpreted as text; binary files that contain only textual data—without, for example, any formatting information—are called plain text files. In many cases, plain text files are considered to be different from binary files because binary files are made up of more than just plain text. When downloading, a completely functional program without any installer is also often called program binary, or binaries (as opposed to the source code).

Contents

[hide]

Structure

Unlike Text files, there is no special character present in the binary mode files to mark End-of-file. The binary mode files keep track of the end-of-file from number of characters present in the dictionary entry of the file. Binary files are usually thought of as being a sequence of bytes, which means the binary digits (bits) are grouped in eights. Binary files typically contain bytes that are intended to be interpreted as something other than text characters. Compiled computer programs are typical examples; indeed, compiled applications (object files) are sometimes referred to, particularly by programmers, as binaries. But binary files can also contain images, sounds, compressed versions of other files, etc. — in short, any type of file content whatsoever.

Some binary files contain headers, blocks of metadata used by a computer program to interpret the data in the file. For example, a GIF file can contain multiple images, and headers are used to identify and describe each block of image data. If a binary file does not contain any headers, it may be called a flat binary file. But the presence of headers are also common in plain text files, like email and html files.

Manipulation

To send binary files through certain systems (such as e-mail) that do not allow all data values, they are often translated into a plain text representation (using, for example, Base64). Encoding the data has the disadvantage of increasing the file size during the transfer (for example, using Base64 will increase the file's size by approximately 30%), as well as requiring translation back into binary after receipt. The increased size may be countered by lower-level link compression, as the resulting text data will have about as much less entropy as it has increased size, so the actual data transferred in this scenario would likely be very close to the size of the original binary data. See Binary-to-text encoding for more on this subject.

Microsoft Windows and its standard libraries allow the programmer to specify a parameter indicating if a file is expected to be plain text or binary when opening a file; this affects the standard library calls to read and write from the file in that the system converts between the "line break" character (the ASCII linefeed character) and the line break sequence the operating system expects applications to use in files (the linefeed and carriage return characters in sequence). Unix also allows this, but there are no distinctions between text and binary files for the system to make, as Unix only uses a single line feed character to store a line break in files. This reflects the fact that the distinction between the two types of files is to a certain extent arbitrary.

Viewing

A hex editor or viewer may be used to view file data as a sequence of hexadecimal (or decimal, binary or ASCII character) values for corresponding bytes of a binary file.

If a binary file is opened in a text editor, each group of eight bits will typically be translated as a single character, and you will see a (probably unintelligible) display of textual characters. If the file is opened in some other application, that application will have its own use for each byte: maybe the application will treat each byte as a number and output a stream of numbers between 0 and 255 — or maybe interpret the numbers in the bytes as colors and display the corresponding picture. If the file is itself treated as an executable and run, then the operating system will attempt to interpret the file as a series of instructions in its machine language.

Interpretation

Standards are very important to binary files. For example, a binary file interpreted by the ASCII character set will result in text being displayed. A custom application can interpret the file differently, a byte may be a sound, or a pixel, or even an entire word. Binary itself is meaningless, until such time as an executed algorithm defines what should be done with each bit, byte, word or block. Thus, just examining the binary and attempting to match it against known formats can lead to the wrong conclusion as to what it actually represents. This fact can be used in steganography, where an algorithm interprets a binary data file differently to reveal hidden content. Without the algorithm, it is impossible to tell that hidden content exists.

Binary compatibility

Two files that are binary compatible will have the same pattern of zeros and ones in the data portion of the file. The file header, however, may be different.

The term is used most commonly to state that data files produced by one application are exactly the same as data files produced by another application. For example, some software companies produce applications for Windows and the Macintosh that are binary compatible, which means that a file produced in a Windows environment is interchangeable with a file produced on a Macintosh. This avoids many of the conversion problems caused by importing and exporting data.

See also

External links